

#### 香港中文大學 The Chinese University of Hong Kong

CENG3430 Rapid Prototyping of Digital Systems

## Lecture 07:

# Rapid Prototyping (I) – Integration of ARM and FPGA

**Ming-Chang YANG** 



## High-level vs Hardware Description Lang.







#### **Outline**



- Rapid Prototyping with Zynq
- Rapid Prototyping (I): Integration of ARM and FPGA
  - Case Study: Lab 08 Software Stopwatch
    - IP Block Design (Xilinx Vavido)
      - ① IP Block Creation & AXI Interfacing
      - ② IP Integration
      - ③ HDL Wrapper
      - ④ Generate Bitstream
    - ARM Programming (Xilinx SDK)
      - S ARM Programming
      - © Launch on Hardware

## **Zynq Features**



- The defining features of Zynq:
  - Processing System (PS): Dual-core ARM Cortex-A9 CPU
  - Programmable Logic (PL): Equivalent traditional FPGA
  - Advanced eXtensible Interface (AXI): High bandwidth, low latency connections between PS and PL.
    - PS and PL can each be used for what they do best,
       without the overhead of interfacing between PS and PL.



## Prototyping with FPGA: PL Only



- However, so far, our designs are implemented <u>only</u> using the programmable logic of Zynq with VHDL.
  - It is usually hard to implement complicated logic or software.





#### **PS** for Software:

general purpose sequential tasks, operating system, GUIs, user applications, etc.

PS-side

I/Os

## AXI:

hardware interfacing between PS & PL

#### **PL for Hardware:**

intensive data computation, PLside peripheral communication, etc.



## Zynq-7000 SoC





## **Design Flow with Zynq**





## Key: Hardware/Software Partitioning



PS and PL can each be used for what they do best.



## Prototyping Styles with Zynq ZedBoard



Xilinx SDK (C/C++)

Bare-metal Applications

**Applications** 

**SDK** (Shell, C, Java, ...)

Board Support Package

Operating System

Process System (PS)

Board Support Package

software

Xilinx Vivado (HDL)

Programmable Logic Design

Style 1) FPGA (PL)

VHDL or Verilog
Programming
(Lec. 01~06 & 10)

Hardware Base System

Style 2) ARM + FPGA

ARM Programming & IP Block Design (Lec. 07 & 09)

Hardware Base System

Style 3) Embedded OS

Shell Script & sysfs EMIO GPIO (*Lec. 08*)

hardware

Program Logic (PL)

#### **Outline**



- Rapid Prototyping with Zynq
- Rapid Prototyping (I): Integration of ARM and FPGA
  - Case Study: Lab 08 Software Stopwatch
    - IP Block Design (Xilinx Vavido)
      - IP Block Creation & AXI Interfacing
      - ② IP Integration
      - 3 HDL Wrapper
      - 4 Generate Bitstream
    - ARM Programming (Xilinx SDK)
      - **S** ARM Programming
      - 6 Launch on Hardware

### Integration of ARM and FPGA



- To integrate ARM and FPGA, we need to do:
  - IP Block Design on Xilinx Vivado using HDL
  - ARM Programming on Xilinx SDK using C/C++



## Intellectual Property (IP) Block



- IP Block (or IP Core): a hardware specification used to configure the logic resources of an FPGA.
- IP is crucial in FPGA and embedded system designs.
  - IP allows system designers to pick-and-choose from a wide array of pre-developed design blocks.
  - IP saves development time, as well as provides guaranteed functionality without the need for extensive testing.



#### Hard vs. Soft IP Block



- Hard IP Block: Permit <u>no</u> realistic method of modification by end users.
  - Firm IP Block: An IP block already <u>undergone full synthesis</u>, place and route design flow for a targeted FPGA/ASIC.
- Soft IP Block: <u>Allow</u> end users to customize the IP by controlling the synthesis, place and route design flow.
  - The highest level of soft IP block customization is available when the source HDL code is provided.
    - Such as the C source code is provided.
  - Soft IP block can be also provided as a gate-level netlist.
    - Such as the compiled assembly code listing is provided.

#### **Sources of IP Block**



- IP Libraries: Xilinx provides an extensive catalogue of soft IP cores for the Zynq-7000 AP family.
  - Ranging from building blocks (such as FIFOs and arithmetic operators) up to fully functional processor blocks.
- Third-party IP is also available, both commercially and from the open-source community.
- IP Creation: The final option is to create by yourself.
  - The most traditional method of IP creation is for it to be developed in HDLs (such as VHDL or Verilog).
  - Recently, other methods of IP creation have also been introduced to Vivado, such as High Level Synthesis (HLS).

## Steps of ARM-FPGA Integration



- PART 1: IP Block Design (Software: Xilinx Vivado)
  - ① Create and Package the PL logic blocks into intellectual property (IP) block with AXI4 Interface.
    - With AXI4, data can be exchanged via shared 32-bit registers.
  - ② Integrate the <u>customized</u> (or pre-developed) IP block with ZYNQ7 Processing System (PS) via IP Block Design.
    - Vivado can auto-connect IP block and ARM core via AXI interface.
  - ③ Create HDL Wrapper and Add Constraints to automatically generate the HDL code (VHDL or Verilog).
  - Generate and Program Bitstream into the board.
- PART 2: ARM Programming (Software: Xilinx SDK)
  - ⑤ Design the bare-metal application in C/C++ language.
  - **© Launch on Hardware (GDB):** Run the code on ARM core.

#### **Outline**



- Rapid Prototyping with Zynq
- Rapid Prototyping (I): Integration of ARM and FPGA
  - Case Study: Lab 08 Software Stopwatch
    - IP Block Design (Xilinx Vavido)
      - IP Block Creation & AXI Interfacing
      - ② IP Integration
      - 3 HDL Wrapper
      - 4 Generate Bitstream
    - ARM Programming (Xilinx SDK)
      - **S** ARM Programming
      - 6 Launch on Hardware

#### Recall: What we have done in Lab06





entity sevenseg is

- *Task1*: *Display the input number (XY) in hexadecimal*
- Task2: Count down from the input number (XY) to (00)

## Recall: Lab06 (1/2)



```
-- generate 1ms and 1s clocks
process (clk)
begin
  if rising edge (clk) then
    if (s count = \frac{1}{49}999999) then
      s pulse <= not s pulse;
      s count <= 0;
    else
      s count <= s count + 1;
    end if:
    if (ms count = 49999) then
      ms pulse <= not ms_pulse; .....
      ms count <= 0;
    else
      ms count <= ms count + 1;
    end If;
  end if:
end process;
-- read button (combinational)
process (btn)
begin
  if rising edge (btn) then
    counter en <= not counter en;
  end if;
end process;
```

```
-- count down
process (s pulse)
  begin
    if rising edge (s pulse) then
      if (counter en = '1') then
        if (counter=0) then
          counter <= to integer (
                      unsigned(switch));
        else
          counter <= counter - 1;
       end if;
      end if:
    counter vec <= std logic vector(
                    to unsigned (counter, 8));
    end if;
  end process;
  -- update the seven segment display
····process (ms pulse)
    ... -- see the next page
  end process;
  -- output ssd (combinational)
  process (digit)
   ... -- see the next page
  end process;
```

## Recall: Lab06 (2/2)



```
-- generate 1ms and 1s clocks
process (clk)
begin
  if rising edge (clk) then
    if (s count = \frac{49}{9999999}) then
      s pulse <= not s pulse; process (ms pulse)
      s count \leq 0;
    else
      s count <= s count + 1;
    end if:
    if (ms count = 49999) then
      ms count <= 0;
    else
      ms count <= ms count + 1;
    end If;
  end if;
end process;
-- read button (combinational)
process (btn)
begin
  if rising edge (btn) then
    counter en <= not counter en; process (digit)
  end if;
end process;
```

```
-- count down
                          process (s pulse)
                              ... -- see the previous page
                            end process;
                            -- update the seven segment display
                            begin
                              if ms pulse='1' then
                                if (counter en = '1') then
                            digit <= counter vec(7 downto 4);
                                else
ms pulse <= not ms pulse; digit <= switch (7 downto 4);
                               end if;
                              else
                                if (counter en = '1') then
                            digit <= counter vec(3 downto 0);
                                else
                            digit <= switch(3 downto 0);</pre>
                               end if;
                              end if;
                               ssdcat <= ms pulse; -- select display</pre>
                            end process;
                             -- output ssd (combinational)
                            begin
                               case digit is
                                when "0000" => ssd <= "11111110";
                              end case;
                                                                    20
                             end process;
```

## Hardware vs. Software Stopwatch



- In Lab 06, what we've done is a hardware stopwatch in which the FPGA (PL) is responsible for both:
  - Hardware: Interfacing with the user via switch and btn.
  - Software: Generating the time to be shown on ssd and dealing with different user inputs.
- In Lab 08, we will design a software stopwatch through ARM-FPGA integration as follows:
  - Hardware: FPGA (PL) is <u>only</u> responsible for hardware interfacing with the user via <u>switch</u>, <u>btn</u>, and <u>led</u>.
  - Software: ARM (PS) is responsible for generating the values to be shown on ssd and led, and dealing with different user inputs or events.
    - By ARM programming, an even more complicated control logic can be realized in an easier way.

## Lab08: Design Specification (1/2)





| Register | AXI Port / Signal Name | Related Port on FPGA | Function                               |
|----------|------------------------|----------------------|----------------------------------------|
| 0        | s_slv_reg0 / slv_reg0  | ssd                  | SSD output                             |
| 1        | s_slv_reg1 / slv_reg1  | led                  | LED output                             |
| 2        | s_slv_reg2 / slv_reg2  | switch               | Switch input                           |
| 3        | s_slv_reg3 / slv_reg3  | btn                  | Button input                           |
| 4        | s_slv_reg4 / slv_reg4  | timer                | Timer signal input (generated by FPGA) |
| N/A      | N/A                    | clk                  | 100MHz clock signal from PL            |
| N/A      | N/A                    | ssdcat               | 7-segment digit selection              |
| N/A      | N/A / timer            | N/A                  | 1kHz clock divided from clk            |

## Lab08: Design Specification (2/2)



- We need <u>five</u> AXI slave registers (s\_slv\_reg0~4) for exchanging data between ARM and FPGA:
  - The ARM processor reads the input value from the switches and the buttons, as well as a 1 KHz timer signal.
    - •s slv reg2: Switch input
    - s slv reg3: Button input
    - s\_slv\_reg4: 1 KHz clock divided from 1 MHz clk of PL.
  - The C program runs on the ARM processor, calculates the stopwatch's time based on the data input, generates values to be displayed on the 7-segment displays and the LEDs, and sends the data back to the FPGA for display.
    - •s slv reg0: SSD output
    - •s slv reg1: Led output

#### **Outline**



- Rapid Prototyping with Zynq
- Rapid Prototyping (I): Integration of ARM and FPGA
  - Case Study: Lab 08 Software Stopwatch
    - IP Block Design (Xilinx Vavido)
      - ① IP Block Creation & AXI Interfacing
      - ② IP Integration
      - ③ HDL Wrapper
      - ④ Generate Bitstream
    - ARM Programming (Xilinx SDK)
      - S ARM Programming
      - 6 Launch on Hardware

#### **1** IP Block Creation



#### IP Block Creation in HDL

- Hardware description languages (HDLs), such as VHDL and Verilog, are specialized programming languages.
  - HDLs describe the operation and structure of digital circuits.
- The ability to create IP cores in HDL allows you the maximum control over the functionality of your peripheral.

#### IP Block Creation in Vivado High-Level Synthesis

- Vivado HLS is a tool provided by Xilinx.
- HLS is capable of <u>converting C-based designs into RTL</u> <u>design files for implementation</u> of Xilinx All Programmable devices (see Lecture 09).
  - C-based Designs: C, C++, or SystemC
  - RTL Designs: VHDL, Verilog, or SystemC

#### **1** IP Block Creation



 According to our design specification, we need to have five AXI registers for exchanging data:



- Two .vhd files will be generated automatically:
  - stopwatch\_controller\_v1\_0.vhd: This file instantiates the AXI-Lite interface and contain the stopwatch functionality.
  - stopwatch\_controller\_v1\_0\_S00\_AXI.vhd: This file contains only the AXI-Lite bus functionality.

## **① AXI Interfacing**



- IP blocks designed in HDL are communicated the processing system (PS) via an AXI interface.
  - Vivado will *auto-create* the following source files for editing:
    - <peripheral>\_<version>.vhd: the top-level file defines the design interface, lists connections and ports for the AXI interface, as well as implements the functionality of user-defined entities.
    - <peripheral>\_<version>\_<AXI\_instance>.vhd: describes an instance of AXI interface for this IP block for integrating into PS.



Where:

<peripheral> is the name of
the IP to be created.

<version> is the current
version, i.e. v1\_0.

<a href="#"><AXI\_instance</a> is the AXI4 master (M) or slave (S) interface instance, i.e. M00\_AXI or S00\_AXI. (There may be multiple instances per peripheral).

## stopwatch\_controller\_v1\_0\_S00\_AXI.vhd (1/2)

 Vivado will auto-declare slave registers (as internal signals) based on the number entered by users:

```
---- Signals for user logic register space example
---- Number of Slave Registers 5
signal slv_reg0: std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
signal slv_reg1: std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
signal slv_reg2: std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
signal slv_reg3: std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
signal slv_reg4: std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
```

But we still need to define ports for these registers:

```
-- Users to add ports here
s_slv_reg0: out std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
s_slv_reg1: out std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
s_slv_reg2: in std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
s_slv_reg3: in std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
s_slv_reg4: in std_logic_vector(C_S_AXI_DATA_WIDTH-1 downto 0);
-- User ports ends
```

## stopwatch\_controller\_v1\_0\_S00\_AXI.vhd (2/2)

 Then we interconnect the internal slave registers and the user-defined ports:

- Besides, we also need to disable some autogenerated "write logic" for slv\_reg2 ~ slv\_reg4 (i.e., switch, button, and timer), since:
  - Their values would be read-only from the FPGA, and
  - The user application (i.e., stopwatch.c) <u>cannot</u> change their values.

## stopwatch\_controller\_v1\_0.vhd (1/3)



- Next, we complete the stopwatch functionality:
  - 1) We first define ports in entity of stopwatch\_controller\_v1\_0:

```
-- Users to add ports here

clk: in std_logic;

btn: in std_logic_vector(4 downto 0);

switch: in std_logic_vector(7 downto 0);

ssdcat: out std_logic;

ssd: out std_logic_vector(6 downto 0);

led: out std_logic_vector(7 downto 0);

-- User ports ends
```

#### 2) The following changes should be also made:

- Add generic parameters (if any),
- Add ports in component of stopwatch\_controller\_v1\_0\_S00\_AXI,
  - Since we define new ports for the five registers in \*AXI.vhd
- Add other user-defined components (if any), and
- Add required internal signals for user logic and functionality.

(Note: Please refer to the lab sheet for more detailed instructions.)

## stopwatch\_controller\_v1\_0.vhd (2/3)



- Next, we complete the stopwatch functionality:
  - 3) Then we create and connect stopwatch\_AXI and ssd\_controller components in the architecture body of stopwatch\_controller\_v1\_0 as follows:

```
stopwatch_controller_v1_0_S00_AXI
...
port map (
    -- Users to add port map
    s_slv_reg0 => slv_reg0,
    s_slv_reg1 => slv_reg1,
    s_slv_reg2 => slv_reg2,
    s_slv_reg3 => slv_reg3,
    s_slv_reg4 => timer,
    -- User port map ends
...
```

```
-- Add user logic here

ssd_controller

generic map (

   cat_period => C_MS_LIMIT )

port map (

   clk => clk,

   value => ssd_value,

   ssd => ssd,

   ssdcat => ssdcat );
```

Note: VHDL allows the designer to parametrize the entity during the component instantiation via **generic map**. It is used here to indicate the value for counting 1 ms in ZedBoard.

## stopwatch\_controller\_v1\_0.vhd (3/3)



- Next, we complete the stopwatch functionality:
  - 4) Last, we implement the stopwatch logic in the architecture body of stopwatch\_controller\_v1\_0 as follows:

```
ssd_value <= slv_reg0 (7 downto 0); ← ssd_controller will take over the ssd display!
led <= slv reg1(7 downto 0); ← LED output</pre>
slv reg2 <= (C S00 AXI DATA WIDTH-1 downto 8 => '0') & switch; \( \subseteq \text{Switch input} \)
slv reg3 <= (C S00 AXI DATA WIDTH-1 downto 5 => '0') & btn; 
Button input
process(clk, ms count, timer) begin
  if (clk'event and clk='1') then
                                                              stopwatch controller v1 0.vhd
                                                     IP
     if (ms count = C MS LIMIT-1) then
                                                                                  *S00 AXI.vhd
                                                               ssd.vhd
       ms count <= (OTHERS => '0');
                                                                          slv reg0
                                                  ssd
                                                                                     s slv reg0
                                                                   (value)
       timer <= timer + 1; ← Timer input
                                                [0...6]
                                                                SSD
                                               ssdcat
                                                                                      AXI
     else
                                                              Controller
                                                         clk
                                                                                    Interface
       ms count <= ms count + 1;
                                                                          slv reg1
                                                  led
                                                                                     s slv reg1
                                                [0...7]
     end if:
                                                                          slv reg2
                                               switch
                                                                                     s slv reg2
                                                [0...7]
  end if;
                                                                          slv reg3
                                                 btn
                                                                                     s slv_reg3
end process;
                                                [0...4]
                                                                           timer
                                                 clk
                                                                                     s slv reg4
-- User logic ends
                                                              Timer (ms)
```

#### ssd.vhd



```
-- count 1 ms (generic: cat period) -- assign digit based on sel
                                      digit <= value(7 downto 4) when sel='1'
process (clk, count)
                                               else value (3 downto 0);
begin
                                      -- display digit on ssd
  if (clk'event and clk='1') then
                                      process(clk, digit) begin
    if (count = cat period-1) then
                                        if (clk'event and clk='1') then
      count \leq 0;
      ms pulse <= '1';
                                          case digit is
                                            when x"0" =>
    else
                                                            ssd <= b"1111110";
                                                            ssd <= b"0110000";
      count <= count + 1;
                                            when x"1" =>
                                            when x"2" =>
      ms pulse <= '0';
                                                            <u>ssd</u> <= b"1101101";
                                                            <u>ssd</u> <= b"1111001";
  end if;
                                            when x"3" =>
end if;
                                            when x''4'' =>
                                                            ssd <= b"0110011";
                                            when x"5" =>
                                                            ssd <= b"1011011";
end process;
-- negate sel every 1 ms
                                            when x"6" =>
                                                            <u>ssd</u> <= b"1011111";
                                            when x"7" =>
                                                            ssd <= b"1110000";
process (clk, sel, ms pulse)
                                            when x"8" =>
                                                            <u>ssd</u> <= b"1111111";
begin
  if (clk'event and clk='1') then
                                            when x''9'' =>
                                                            ssd <= b"1110011";
                                            when x"a" =>
    if (ms pulse = '1') then
                                                            ssd <= b"1110111";
      sel <= not sel;
                                            when x"b" =>
                                                            ssd <= b"0011111";
                                                            <u>ssd</u> <= b"1001110";
                                            when x''c'' =>
    else
                                            when x"d" =>
                                                            <u>ssd</u> <= b"0111101";
      sel <= sel;
                                            when x"e" =>
                                                            ssd <= b"1001111";
    end if:
                                                            ssd <= b"1000111";
  end if;
                                            when x''f'' =>
                                            when others \Rightarrow ssd \Leftarrow b"0000000";
end process;
-- output ssdcat
                                          end case;
                                                             ↑ SSD output
end if;
```

## ① IP Packager and IP Catalog



 Vivado IP Packager enables developers to quickly prepare IP for integration in the Vivado IP Catalog.

 Once the IP is selected in a Vivado project, the IP is treated like any other IP module from the IP Catalog.



IP Development Flow

IP Use Flow

## **② IP Integration**



 Vivado IP Integrator provides a graphical "canvas" to configure IP blocks in an automated development flow.



## **Block Design for Stopwatch System**



 Vidado will help us to auto-connect the stopwatch and the ARM processor through AXI interface.



## ③ HDL Wrapper & ④ Generate Bitstream

- Vivado can help to create a top-level HDL Wrapper.
  - This will automatically generate the VHDL code for the whole block design.
- With a constraint file, the Bitstream can be generated and downloaded into the targeted board.







#### **Outline**



- Rapid Prototyping with Zynq
- Rapid Prototyping (I): Integration of ARM and FPGA
  - Case Study: Lab 08 Software Stopwatch
    - IP Block Design (Xilinx Vavido)
      - ① IP Block Creation & AXI Interfacing
      - ② IP Integration
      - 3 HDL Wrapper
      - Generate Bitstream
    - ARM Programming (Xilinx SDK)
      - S ARM Programming
      - ⑥ Launch on Hardware

## **⑤ ARM Programming**



- We need some header files: one for controlling the ZYNQ processor in general, and the other to bring in items specific to our stopwatch controller:
  - #include "xparameters.h"
  - #include "stopwatch controller.h"
- Then, we can make some simple names for the addresses of the registers in our IP block.
  - #define **SW\_BASE** XPAR\_STOPWATCH\_CONTROLLER\_0\_S00\_AXI\_BASEADDR
  - #define SSD\_ADDR STOPWATCH\_CONTROLLER\_S00\_AXI\_SLV\_REG0\_OFFSET
- We are creating a bare metal software program.
  - There is *nothing but our program* running on the ARM.
  - Thus, our program should really never exit (How? By loop!).

## Key: Interfacing via Registers



```
stopwatch.c
while (1) ← Infinite loop
        /* INPUT */
        /* btn & switch & time */
        btn in = STOPWATCH CONTROLLER mReadReg (SW BASE, BTN ADDR);
                                           ↑ read register SWITCH_ADDR
        /* time */
        timer in = STOPWATCH CONTROLLER mReadReg (SW BASE, TIMER ADDR);
        u32 time display;
        ··· 

User logic for determining the time to be displayed on LED and SSD
        /* OUTPUT */
        /* led & ssd */
        led out = time display;
        STOPWATCH CONTROLLER mWriteReg (SW BASE, LED ADDR, led out);
                                  ↑ write register SSD_ADDR
        /* FEEDBACK */ ← Like the states for FSMs
        btn in prev = btn in;
        switch in prev = switch in;
```

## **© Launch on Hardware (GDB)**



- Finally, after the software stopwatch (.c) is ready, you
  can run it on ARM by Launch on Hardware (GDB).
  - GDB: GNU Debugger is the most popular debugger for UNIX systems to debug C and C++ programs.
- Vivado will help to automatically compile, link, and load your program.



## Design Flow of ARM-FPGA Integration







**Create New Project** 

**Create & Package New IP** 

**Create New Block Design** 

Add PS7 and other New IPs **Configure IPs Run Connection Automation Run Block Automation** Validate Design

**Create HDL Wrapper** 

**Generate Bitstream** 

**Export Hardware to SDK** 



Add IP Repository into SDK

**Create Board Support Package** 

**Create New 'C' Application** 

Write 'C' Code

**Build the Application (auto)** 

**Configure FPGA** 

**Run on Hardware (GDB)** 



## **Summary**



- Rapid Prototyping with Zynq
- Rapid Prototyping (I): Integration of ARM and FPGA
  - Case Study: Lab 08 Software Stopwatch
    - IP Block Design (Xilinx Vavido)
      - ① IP Block Creation & AXI Interfacing
      - ② IP Integration
      - ③ HDL Wrapper
      - ④ Generate Bitstream
    - ARM Programming (Xilinx SDK)
      - S ARM Programming
      - ⑥ Launch on Hardware